31 research outputs found

    A multilingual neural coaching model with enhanced long-term dialogue structure

    In this work we develop a fully data-driven conversational agent capable of carrying out motivational coaching sessions in Spanish, French, Norwegian, and English. Unlike the majority of coaching, and in general well-being-related, conversational agents found in the literature, ours is not designed by handcrafted rules. Instead, we directly model the coaching strategy that professionals apply with end users. To this end, we gather a set of virtual coaching sessions through a Wizard of Oz platform and apply state-of-the-art Natural Language Processing techniques. We employ a transfer learning approach, pretraining GPT2 neural language models and fine-tuning them on our corpus. However, since these only take a local dialogue history as input, a simple fine-tuning procedure cannot model the long-term dialogue strategies that appear in coaching sessions. To alleviate this issue, we first propose to learn dialogue phase and scenario embeddings in the fine-tuning stage. These indicate to the model at which part of the dialogue it is and which kind of coaching session it is carrying out. Second, we develop a global deep learning system that controls the long-term structure of the dialogue. We also show that this global module can be used to visualize and interpret the decisions taken by the conversational agent, and that the learnt representations are comparable to dialogue acts. Automatic and human evaluations show that our proposals improve on the baseline models.
Finally, interaction experiments with coaching experts indicate that the system is usable and gives rise to positive emotions in Spanish, French and English, while the results in Norwegian point out that there is still work to be done in fully data-driven approaches for very-low-resource languages. This work has been partially funded by the Basque Government under grant PRE_2017_1_0357 and by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 769872.
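    The phase and scenario embeddings described above can be pictured as extra learned lookup tables whose vectors are summed into every token position, much like GPT-2 sums token and position embeddings. The following is a minimal numpy sketch of that idea; all dimensions and names (`W_phase`, `W_scen`, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_phases, n_scenarios = 8, 50, 4, 3
W_tok = rng.normal(size=(vocab, d_model))        # token embedding table
W_phase = rng.normal(size=(n_phases, d_model))   # which part of the dialogue
W_scen = rng.normal(size=(n_scenarios, d_model)) # which kind of coaching session

def embed(token_ids, phase_id, scenario_id):
    # GPT-2 sums token and position embeddings; here we additionally sum a
    # dialogue-phase and a scenario embedding into every token position,
    # so the model is conditioned on where in the session it is.
    return W_tok[token_ids] + W_phase[phase_id] + W_scen[scenario_id]

h = embed(np.array([3, 7, 1]), phase_id=2, scenario_id=0)
print(h.shape)  # (3, 8): one conditioned vector per input token
```

    At fine-tuning time the two extra tables would be trained jointly with the language model, so the same token sequence can be represented differently in, say, an opening phase versus a closing phase.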

    A Differentiable Generative Adversarial Network for Open Domain Dialogue

    Paper presented at IWSDS 2019: International Workshop on Spoken Dialogue Systems Technology, Siracusa, Italy, April 24-26, 2019. This work presents a novel methodology to train open-domain neural dialogue systems within the framework of Generative Adversarial Networks with gradient-based optimization methods. We avoid the non-differentiability of text-generating networks by approximating the word vector corresponding to each generated token via a top-k softmax. We show that a weighted average of the word vectors of the most probable tokens, computed from the probabilities resulting from the top-k softmax, is a good approximation of the word vector of the generated token. Finally, we demonstrate through a human evaluation process that training a neural dialogue system via adversarial learning with this method successfully discourages it from producing generic responses; instead, it tends to produce more informative and varied ones. This work has been partially funded by the Basque Government under grant PRE_2017_1_0357, by the University of the Basque Country UPV/EHU under grant PIF17/310, and by the H2020 RIA EMPATHIC project (Grant No. 769872).
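    The top-k softmax approximation described above can be sketched in a few lines: keep only the k most probable tokens, renormalize their probabilities with a softmax, and return the probability-weighted average of their word vectors, which is differentiable with respect to the logits. Vocabulary size, dimensions, and the embedding table `E` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, d = 10, 4
E = rng.normal(size=(vocab, d))   # word-vector table (illustrative)
logits = rng.normal(size=vocab)   # generator scores for one decoding step

def topk_softmax_embedding(logits, E, k=3):
    # Select the k most probable tokens, renormalize their probabilities,
    # and return the probability-weighted average of their word vectors.
    # Unlike argmax sampling, this keeps gradients flowing to the logits.
    idx = np.argsort(logits)[-k:]
    p = np.exp(logits[idx] - logits[idx].max())
    p /= p.sum()
    return p @ E[idx]

v = topk_softmax_embedding(logits, E, k=3)
print(v.shape)  # (4,): an approximate word vector for the generated token
```

    With k=1 this collapses to the word vector of the argmax token, so k trades off fidelity to the discrete choice against gradient smoothness.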

    Perceptual borderline for balancing multi-class spontaneous emotional data

    Speech is a behavioural biometric signal that can provide important information for understanding human intent as well as emotional status. This paper is centered on the speech-based identification of seniors' emotional status during their interaction with a virtual agent playing the role of a health professional coach. Under real conditions, we can only identify a small set of task-dependent spontaneous emotions. The number of identified samples differs largely across emotions, which results in an imbalanced dataset. This research proposes the dimensional model of emotions as a perceptual representation space, as an alternative to the generally used acoustic one. The main contribution of the paper is the definition of a perceptual borderline for the oversampling of minority emotion classes in this space. This limit, based on arousal and valence criteria, leads to two methods of balancing the data: Perceptual Borderline oversampling and Perceptual Borderline SMOTE (Synthetic Minority Oversampling Technique). Both methods are implemented and compared to the state-of-the-art approaches of Random oversampling and SMOTE. The experimental evaluation was carried out on three imbalanced datasets of spontaneous emotions acquired in human-machine scenarios in three different cultures: Spain, France and Norway. The emotion recognition results obtained with neural network classifiers show that the proposed perceptual oversampling methods lead to significant improvements over the state of the art for all scenarios and languages. The research presented in this paper was conducted as part of the EMPATHIC project and the MENHIR MSCA action, which received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No. 769872 and No. 823907, respectively.
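    The SMOTE-style variant described above can be illustrated as: restrict the minority samples to those inside a perceptual border in the arousal-valence plane, then synthesize new points by interpolating between a borderline sample and one of its nearest neighbours. This is a generic sketch under stated assumptions; the circular border (`border_center`, `radius`) is a hypothetical stand-in for the paper's actual arousal/valence criteria.

```python
import numpy as np

rng = np.random.default_rng(2)

def perceptual_borderline_smote(X_min, border_center, radius, n_new, k=3):
    # Keep only minority samples inside the perceptual border, then apply
    # SMOTE-style interpolation among them: pick a sample, pick one of its
    # k nearest borderline neighbours, and synthesize a point on the segment.
    inside = X_min[np.linalg.norm(X_min - border_center, axis=1) <= radius]
    synth = []
    for _ in range(n_new):
        x = inside[rng.integers(len(inside))]
        d = np.linalg.norm(inside - x, axis=1)
        nbrs = inside[np.argsort(d)[1:k + 1]]        # exclude x itself
        x2 = nbrs[rng.integers(len(nbrs))]
        synth.append(x + rng.random() * (x2 - x))    # interpolate along segment
    return np.array(synth)

X = rng.uniform(-1, 1, size=(50, 2))  # (arousal, valence) minority samples
new = perceptual_borderline_smote(X, border_center=np.zeros(2), radius=0.8, n_new=5)
print(new.shape)  # (5, 2): five synthetic minority samples
```

    Because interpolation only happens between samples inside the border, synthetic points cannot drift into perceptually neutral regions the way plain SMOTE can.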

    Audio Embeddings help to learn better dialogue policies

    Presented at ASRU 2021, Cartagena (Colombia), 13-17 December 2021. Neural transformer architectures have gained a lot of interest for text-based dialogue management in the last few years. They have shown high learning capabilities for open-domain dialogue with huge amounts of data, and also for domain adaptation in task-oriented setups. But the potential benefits of exploiting the users' audio signal have rarely been explored in such frameworks. In this work, we combine text dialogue history representations generated by a GPT-2 model with audio embeddings obtained by the recently released Wav2Vec2 transformer model. We jointly fine-tune these models to learn dialogue policies via supervised learning and two policy-gradient-based reinforcement learning algorithms. Our experimental results, using the DSTC2 dataset and a simulated user model capable of sampling audio turns, reveal that audio embeddings lead to overall higher task success than text alone, with statistically significant results across evaluation metrics and training algorithms.
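    The fusion step described above can be pictured as concatenating the text-history embedding (e.g. from GPT-2) with the audio embedding (e.g. from Wav2Vec2) into one state vector that a policy head maps to dialogue-action scores. The sketch below is a hypothetical minimal version: the dimensions and the single linear layer are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

d_text, d_audio, n_actions = 8, 6, 4
W = rng.normal(size=(d_text + d_audio, n_actions))  # policy head (illustrative)

def policy_logits(text_emb, audio_emb):
    # Concatenate the two modality embeddings into a joint dialogue state,
    # then score each candidate dialogue action with a linear head.
    fused = np.concatenate([text_emb, audio_emb])
    return fused @ W

logits = policy_logits(rng.normal(size=d_text), rng.normal(size=d_audio))
print(logits.shape)  # (4,): one score per dialogue action
```

    In the supervised setting these scores would be trained with cross-entropy against the annotated system action; in the reinforcement setting they parameterize the policy that the policy-gradient algorithms update.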

    User-Aware Dialogue Management Policies over Attributed Bi-Automata

    Designing dialogue policies that take user behavior into account is complicated due to user variability and behavioral uncertainty. Attributed Probabilistic Finite State Bi-Automata (A-PFSBA) have proven to be a promising framework for developing dialogue managers that capture the users' actions in their structure and adapt to them online, yet developing policies robust to high user uncertainty is still challenging. In this paper, the theoretical A-PFSBA dialogue management framework is augmented by formally defining the notion of exploitation policies over its structure. Under such a definition, multiple path-based policies are implemented: those that take external information into account and those that do not. These policies are evaluated on the Let's Go corpus, before and after an online learning process whose goal is to update the initial model through interaction with end users. In these experiments, the impact of user uncertainty and of structural model learning is thoroughly analyzed. Funded by the Spanish Ministry of Science under grants TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R, and by the European Commission H2020 SC1-PM15 EMPATHIC project, RIA grant No. 769872.

    Automatic Identification of Emotional Information in Spanish TV Debates and Human-Machine Interactions

    Automatic emotion detection is a very attractive field of research that can help build more natural human–machine interaction systems. However, several issues arise when real scenarios are considered, such as the tendency toward neutrality, which makes it difficult to obtain balanced datasets, or the lack of standards for the annotation of emotional categories. Moreover, the intrinsic subjectivity of emotional information increases the difficulty of obtaining valuable data to train machine learning-based algorithms. In this work, two different real scenarios were tackled: human–human interactions in TV debates and human–machine interactions with a virtual agent. For comparison purposes, an analysis of the emotional information was conducted in both. Thus, a profiling of the speakers associated with each task was carried out. Furthermore, different classification experiments show that deep learning approaches can be useful for detecting speakers' emotional information, mainly for arousal, valence, and dominance levels, reaching a 0.7 F1-score. The research presented in this paper was conducted as part of the AMIC and EMPATHIC projects, which received funding from the Spanish Ministry of Science under grants TIN2017-85854-C4-3-R and PDC2021-120846-C43 and from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 769872. The first author also received a PhD scholarship from the University of the Basque Country UPV/EHU, PIF17/310.

    Detection of Sarcasm and Nastiness: New Resources for Spanish Language

    The main goal of this work is to provide the cognitive computing community with valuable resources to analyze and simulate the intentionality and/or emotions embedded in the language employed in social media. Specifically, it is focused on the Spanish language and online dialogues, leading to the creation of SOFOCO (Spanish Online Forums Corpus). It is the first Spanish corpus consisting of dialogic debates extracted from social media, and it is annotated by means of crowdsourcing in order to carry out automatic analysis of subjective language forms, like sarcasm or nastiness. Furthermore, the annotators were also asked about the need for context when making their decisions. In this way, the users' intentions and their behavior inside social networks can be better understood and more accurate text analysis is possible. An analysis of the annotation results is carried out, and the reliability of the annotations is also explored. Additionally, sarcasm and nastiness detection results (around 0.76 F-Measure in both cases) are also reported. The obtained results show the presented corpus to be a valuable resource that might be used in very diverse future work. This study was partially funded by the Spanish Government (TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R), by the European Union's H2020 programme under grant 769872, and by the National Science Foundation of the USA (NSF CISE R1 #1202668).

    Corrective Focus Detection in Italian Speech Using Neural Networks

    Corrective focus is a particular kind of prosodic prominence through which the speaker intends to correct or emphasize a concept. This work develops an Artificial Cognitive System (ACS) based on Recurrent Neural Networks that analyzes suitable features of the audio channel in order to automatically identify corrective focus in speech signals. Two different approaches to building the ACS have been developed. The first addresses the detection of focused syllables within a given Intonational Unit (IU), whereas the second identifies a whole IU as focused or not. The experimental evaluation over an Italian corpus has shown the ability of the Artificial Cognitive System to identify the focus in the speaker's IUs. This ability can lead to further important improvements in human-machine communication. The addressed problem is a good example of synergies between humans and Artificial Cognitive Systems. The research leading to the results in this paper was conducted in the EMPATHIC project (Grant No. 769872), which received funding from the European Union's Horizon 2020 research and innovation programme. Additionally, this work has been partially funded by the Spanish Ministry of Science under grants TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R, by the Basque Government under grant PRE_2017_1_0357, and by the University of the Basque Country UPV/EHU under grant PIF17/310.

    Speech emotion recognition in Spanish TV Debates

    Emotion recognition from speech is an active field of study that can help build more natural human-machine interaction systems. Even though the advancement of deep learning technology has brought improvements in this task, it is still a very challenging field. For instance, when considering real-life scenarios, things such as the tendency toward neutrality or the ambiguous definition of emotion can make labeling a difficult task, causing the dataset to be severely imbalanced and not very representative. In this work we considered a real-life scenario to carry out a series of emotion classification experiments. Specifically, we worked with a labeled corpus consisting of a set of audios from Spanish TV debates and their respective transcriptions. First, an analysis of the emotional information within the corpus was conducted. Then, different data representations were analyzed in order to choose the best one for our task: Spectrograms and UniSpeech-SAT were used for audio representation, and DistilBERT for text representation. As a final step, Multimodal Machine Learning was used with the aim of improving the obtained classification results by combining acoustic and textual information. The research presented in this paper was conducted as part of the AMIC PdC project, which received funding from the Spanish Ministry of Science under grants TIN2017-85854-C4-3-R, PID2021-126061OB-C42 and PDC2021-120846-C43, and it was also partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823907 (MENHIR).

    Dialogue Management and Language Generation for a Robust Conversational Virtual Coach: Validation and User Study

    Designing human–machine interactive systems requires cooperation between different disciplines. In this work, we present a Dialogue Manager and a Language Generator that are the core modules of a voice-based Spoken Dialogue System (SDS) capable of carrying out challenging, long and complex coaching conversations. We also develop an efficient integration procedure for the whole system, which acts as an intelligent and robust Virtual Coach. The coaching task significantly differs from the classical applications of SDSs, resulting in a much higher degree of complexity and difficulty. The Virtual Coach has been successfully tested and validated in a user study with independent elderly users, in three different countries with three different languages and cultures: Spain, France and Norway. The research presented in this paper was conducted as part of the EMPATHIC project, which received funding from the European Union's Horizon 2020 research and innovation programme under Grant No. 769872. Additionally, this work has been partially funded by the BEWORD and AMIC-PC projects of the Spanish Ministry of Science, under Grant Nos. PID2021-126061OB-C42 and PDC2021-120846-C43, respectively. Vázquez and López Zorrilla received PhD scholarships from the Basque Government, with Grant Nos. PRE 2020 1 0274 and PRE 2017 1 0357, respectively.